Discovering Skylines of Subgroup Sets

Authors

  • Matthijs van Leeuwen
  • Antti Ukkonen
Abstract

Many tasks in exploratory data mining aim to discover the top-k results with respect to a certain interestingness measure. Unfortunately, in practice top-k solution sets are rarely satisfactory, if only because redundancy in such results is a severe problem. To address this, a recent trend is to find diverse sets of high-quality patterns. However, a ‘perfect’ diverse top-k cannot exist, since there is an inherent trade-off between quality and diversity. We argue that the best way to deal with the quality-diversity trade-off is to explicitly consider the Pareto front, or skyline, of non-dominated solutions, i.e., those solutions for which neither quality nor diversity can be improved without degrading the other. In particular, we focus on k-pattern set mining in the context of Subgroup Discovery [6]. For this setting, we present two algorithms for the discovery of skylines: an exact algorithm and a levelwise heuristic. We evaluate the performance of both skyline algorithms, as well as the accuracy of the levelwise method. Furthermore, we show that the skylines can be used for the objective evaluation of subgroup set heuristics. Finally, we show characteristics of the obtained skylines, which reveal that different quality-diversity trade-offs result in clearly different subgroup sets. Hence, the discovery of skylines is an important step towards a better understanding of ‘diverse top-k’s’.
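The notion of a skyline of non-dominated solutions can be illustrated with a minimal Python sketch. It is not the exact or levelwise algorithm from the paper. It only filters a list of hypothetical (quality, diversity) scores, both assumed to be maximized, down to its Pareto front. The function names and the sample scores are assumptions made for illustration; in the paper's setting each point would correspond to a subgroup set of size k.

```python
from typing import List, Tuple

# Hypothetical representation: one (quality, diversity) pair per candidate
# solution, where higher is better on both criteria.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both criteria and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def skyline(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by decreasing quality."""
    # Sort by quality (descending), breaking ties by diversity (descending).
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    front: List[Point] = []
    best_diversity = float("-inf")
    for quality, diversity in ordered:
        # Every earlier point has quality >= this one, so this point is
        # non-dominated only if it strictly improves on the best diversity seen.
        if diversity > best_diversity:
            front.append((quality, diversity))
            best_diversity = diversity
    return front


# Made-up scores, purely for illustration.
candidates = [(0.9, 0.1), (0.8, 0.4), (0.7, 0.3), (0.5, 0.8), (0.5, 0.6), (0.2, 0.9)]
front = skyline(candidates)
print(front)
```

On these sample scores, (0.7, 0.3) is dropped because (0.8, 0.4) is better on both criteria, and (0.5, 0.6) is dropped because (0.5, 0.8) matches its quality with higher diversity. The remaining points each represent a different quality-diversity trade-off.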

Similar resources

Discovering Relative Importance of Skyline Attributes

Querying databases with preferences is an important research problem. Among various approaches to querying with preferences, the skyline framework is one of the most popular. A well known deficiency of that framework is that all attributes are of the same importance in skyline preference relations. Consequently, the size of the results of skyline queries may grow exponentially with the number o...

SkyDB: Skyline Aware Query Evaluation Framework

In recent years much attention has been focused on evaluating skylines; however, the existing techniques primarily focus on skyline algorithms over single sets. These techniques face two serious limitations, namely (1) they define skylines to work on a single set only, and (2) they treat skylines as an “add-on”, loosely integrated on top of the query plan. In this work, we investigate the evalu...

Probabilistic Skylines on Uncertain Data

Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic...

Efficient Skyline Computation in MapReduce

Skyline queries are useful for finding interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing, and back-end architectures are switching from single-node environments to non-conventional paradigms like MapReduce. Despite the usefulness of skyline queries, existing works on skyline computation in MapReduce do not take full a...

A Study on Intuitionistic Fuzzy and Normal Fuzzy M-Subgroup, M-Homomorphism and Isomorphism

In this paper, we introduce some properties of an intuitionistic normal fuzzy m-subgroup of an m-group with m-homomorphism and isomorphism. We study the image, the pre-image and the inverse mapping of the intuitionistic normal fuzzy m-subgroups.

Publication date: 2013